Results 1 - 14 of 14
1.
Genome Biol Evol ; 16(3)2024 Mar 02.
Article in English | MEDLINE | ID: mdl-38502059

ABSTRACT

Siphonophores (Cnidaria: Hydrozoa) are abundant predators found throughout the ocean and are important constituents of the global zooplankton community. They range in length from a few centimeters to tens of meters. They are gelatinous, fragile, and difficult to collect, so many aspects of the biology of these roughly 200 species remain poorly understood. To survey siphonophore genome diversity, we performed Illumina sequencing of 32 species sampled broadly across the phylogeny. Sequencing depth was sufficient to estimate nuclear genome size from k-mer spectra in six specimens, ranging from 0.7 to 2.3 Gb, with heterozygosity estimates between 0.69% and 2.32%. Incremental k-mer counting indicates k-mer peaks can be absent with nearly 20× read coverage, suggesting minimum genome sizes range from 1.4 to 5.6 Gb in the 25 samples without peaks in the k-mer spectra. This work confirms most siphonophore nuclear genomes are large relative to the genomes of other cnidarians, but also identifies several with reduced size that are tractable targets for future siphonophore nuclear genome assembly projects. We also assembled complete mitochondrial genomes for 33 specimens from these new data, indicating a conserved gene order shared among nonsiphonophore hydrozoans, Cystonectae, and some Physonectae, revealing the ancestral mitochondrial gene order of siphonophores. Our results also suggest extensive rearrangement of mitochondrial genomes within other Physonectae and in Calycophorae. Though siphonophores comprise a small fraction of cnidarian species, this survey greatly expands our understanding of cnidarian genome diversity. This study further illustrates both the importance of deep phylogenetic sampling and the utility of k-mer-based genome skimming in understanding the genomic diversity of a clade.
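The k-mer genome-size estimate mentioned above rests on a standard relationship: the total number of k-mer instances divided by the depth of the main (homozygous) coverage peak. Below is a minimal Python sketch of that idea. It assumes a depth histogram has already been produced by a k-mer counter, and it uses an arbitrary error cutoff (`min_depth=5`, an assumption). It does not model heterozygosity and does not reproduce the authors' pipeline.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate haploid genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors (assumed cutoff).
    Returns (genome_size_in_kmers, peak_depth).
    """
    trusted = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not trusted:
        raise ValueError("no k-mers above the error cutoff; no peak to use")
    # The depth with the most distinct k-mers is taken as the homozygous peak.
    peak_depth = max(trusted, key=trusted.get)
    # Total k-mer instances = sum over depths of depth * distinct k-mers.
    total_kmers = sum(depth * n for depth, n in trusted.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram: an error spike at depth 1-2 and a coverage peak at depth 20.
hist = {1: 1000, 2: 200, 18: 50, 19: 80, 20: 100, 21: 80, 22: 50}
size, peak = estimate_genome_size(hist)
print(size, peak)  # 360.0 20
```

In a heterozygous genome, the tallest peak can be the half-depth heterozygous peak, which would inflate this naive estimate. Dedicated profile-fitting tools address this case.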


Subjects
Cnidaria , Genome, Mitochondrial , Hydrozoa , Animals , Cnidaria/genetics , Phylogeny , Hydrozoa/genetics , Genomics , Genome Size
2.
Bioinform Adv ; 3(1): vbad154, 2023.
Article in English | MEDLINE | ID: mdl-37904893

ABSTRACT

Motivation: Presenting the integrated results of bioinformatics research can be challenging and requires sophisticated visualization components, which can be time-consuming to develop. This article presents a new way to communicate research findings effectively. Results: We have developed a static web page generator, JSONWP, which is specifically designed for protein bioinformatics research. Using React (a JavaScript library for building interactive and dynamic user interfaces for web applications), we have integrated publicly available bioinformatics visualization components to provide standardized access to them. JSON (JavaScript Object Notation), a lightweight textual data format often used to structure and exchange information between software tools, is used as the input source because it can represent nearly all types of data as key-value pairs. This allows researchers to use their preferred programming language to create a JSON representation, which JSONWP can then convert into a website. No server or domain is required to host the website; only a publicly accessible JSON file is needed. Conclusions: Overall, JSONWP provides a useful new tool for bioinformatics researchers to communicate their findings effectively. The open-source implementation is located at https://github.com/MesihK/react-json-wpbuilder, and the tool can be used at jsonwp.onrender.com.
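Any language with a JSON serializer can produce such an input file. The Python sketch below illustrates the key-value idea only. The page structure and every key name in it (`title`, `sections`, `type`, `columns`, `rows`) are hypothetical placeholders and are not the JSONWP input schema. Consult the project repository for the real format.

```python
import json

# Hypothetical page description: key names and values are placeholders,
# NOT the JSONWP schema.
page = {
    "title": "Example protein report",
    "sections": [
        {"type": "text", "content": "Summary of the analysis."},
        {"type": "table", "columns": ["residue", "score"],
         "rows": [["example", 0.0]]},
    ],
}

# Serialize to JSON text; in practice this would be written to a publicly
# accessible file.
text = json.dumps(page, indent=2)
print(text)
```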

3.
Front Bioinform ; 3: 1227193, 2023.
Article in English | MEDLINE | ID: mdl-37900964

ABSTRACT

Understanding protein sequences and how they relate to protein function is extremely important. One of the most basic operations in bioinformatics is sequence alignment, and usually the first thing learned from an alignment is which positions are most conserved; these are often critical parts of the structure, such as enzyme active-site residues. In addition, the contact pairs in a protein usually correspond closely to the correlations between residue positions in the multiple sequence alignment, and these positions usually change in a systematic and coordinated way: if one position changes, the other member of the pair also changes to compensate. In the present work, these correlated pairs are taken as anchor points for a new type of sequence alignment. The main advantage of the method is that it combines the remote homolog detection of our method PROST with the pairwise sequence substitutions of the rigorous method of Kleinjung et al. We show a few examples of the resulting sequence alignments and how they can improve alignments for function, even for a disordered protein.
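A common way to score how strongly two alignment columns covary is their mutual information. The Python sketch below ranks column pairs in a toy alignment by this score. It is a generic illustration of finding correlated pairs, and it assumes equal-length aligned sequences. It is not the scoring used by the method described above.

```python
import math
from collections import Counter
from itertools import combinations


def column_mutual_information(msa, i, j):
    """Plug-in mutual information (bits) between alignment columns i and j."""
    pairs = [(seq[i], seq[j]) for seq in msa]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def ranked_column_pairs(msa):
    """All column pairs sorted by decreasing mutual information."""
    length = len(msa[0])
    scores = {(i, j): column_mutual_information(msa, i, j)
              for i, j in combinations(range(length), 2)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy alignment: columns 0 and 2 swap A<->L together (a compensating,
# small-large style change); column 1 is fully conserved.
msa = ["AGL", "LGA", "AGL", "LGA"]
ranking = ranked_column_pairs(msa)
print(ranking[0])  # ((0, 2), 1.0)
```

Raw mutual information is inflated by shared ancestry and indirect couplings. Practical contact prediction usually applies corrections such as the average product correction or direct coupling analysis.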

4.
Biophys J ; 122(15): 3069-3077, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37345249

ABSTRACT

Cadherin intermolecular interactions are critical for cell-cell adhesion and play essential roles in tissue formation and the maintenance of tissue structures. In this study, we focus on E-cadherin, a classical cadherin that connects epithelial cells, to understand how these molecules interact in cis and trans conformations, that is, when attached to the same cell or to opposing cells. We employ coevolutionary sequence analysis and molecular dynamics simulations to confirm previously known interaction sites and to identify new ones. The sequence coevolution analysis yields a surprising result: there are no strongly favored intermolecular interaction sites, which is unusual and suggests that many interaction sites may be possible, none strongly preferred over the others. Using molecular dynamics, we test the persistence of these interactions and how they facilitate adhesion. We build several types of cadherin assemblages, with different numbers and combinations of cis and trans interfaces, to understand how these conformations facilitate adhesion. Our results suggest that, in addition to the established interaction sites on the EC1 and EC2 domains, an additional plausible cis interface exists at the EC3-EC5 domains. Furthermore, we identify specific mutations at cis/trans binding sites that impair adhesion within E-cadherin assemblages.


Subjects
Cadherins , Binding Sites , Cadherins/chemistry , Cadherins/metabolism , Cell Adhesion , Mutation , Protein Binding , Animals , Mice
5.
Proc Natl Acad Sci U S A ; 120(9): e2211823120, 2023 02 28.
Article in English | MEDLINE | ID: mdl-36827259

ABSTRACT

There are several hundred million protein sequences, but the relationships among them are not fully available from existing homolog detection methods. There is an essential need for an improved method to push homolog detection to lower levels of sequence identity. The method used here relies on a language model to represent proteins numerically in a matrix (an embedding) and uses discrete cosine transforms to compress the data to its most essential part, significantly reducing the data size. This PRotein Ortholog Search Tool (PROST) is significantly faster, with linear runtimes, and, most importantly, computes the distances between pairs of protein sequences to yield homologs at significantly lower levels of sequence identity than previously possible. The extent of allosteric effects in proteins points to the importance of global aspects of structure and sequence. PROST excels at global homology detection but not at detecting local homologs. Results are validated by strong similarities between the corresponding pairs of structures. The number of remote homologs detected increases significantly, pushing effective sequence matches more deeply into the twilight zone. For human protein sequences that presently have no assigned function, PROST finds significant numbers of putative homologs in 93% of cases, and structurally verified function assignments for 76.4% of these cases. The data compression enables massive searches for homologs with short search times while yielding significant gains in the number of remote homologs detected. The method is sufficiently efficient to permit whole-genome/proteome comparisons. The PROST web server is accessible at https://mesihk.github.io/prost.
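The compression step can be sketched as follows: apply a 2D discrete cosine transform to a per-residue embedding and keep only a small low-frequency block. This turns proteins of different lengths into vectors of the same size that can be compared directly. In this NumPy sketch, the block size (`keep_rows=5`, `keep_cols=16`), the Manhattan distance, and the random stand-in embeddings are all assumptions. PROST's actual transform settings, normalization, and language model are not given here.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    basis[0, :] /= np.sqrt(2.0)
    return basis


def compress_embedding(embedding, keep_rows=5, keep_cols=16):
    """Fixed-size fingerprint from an (L, d) per-residue embedding.

    Applies a 2D DCT (along the sequence axis and the feature axis) and
    keeps only the lowest-frequency keep_rows x keep_cols block.
    """
    length, dim = embedding.shape
    if length < keep_rows or dim < keep_cols:
        raise ValueError("embedding smaller than the kept coefficient block")
    coeffs = dct_matrix(length) @ embedding @ dct_matrix(dim).T
    return coeffs[:keep_rows, :keep_cols].ravel()


def fingerprint_distance(a, b):
    """Manhattan distance between two fingerprints (assumed metric)."""
    return float(np.abs(a - b).sum())


rng = np.random.default_rng(0)
# Random stand-ins for language-model embeddings of two proteins of
# different lengths (120 and 85 residues, 64 features each).
emb_a = rng.normal(size=(120, 64))
emb_b = rng.normal(size=(85, 64))
fp_a, fp_b = compress_embedding(emb_a), compress_embedding(emb_b)
print(fp_a.shape, fingerprint_distance(fp_a, fp_b))
```

Because each fingerprint has a fixed size regardless of sequence length, a database search reduces to cheap vector distance computations. This is what makes linear-time, whole-proteome scans practical.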


Subjects
Data Compression , Proteome , Humans , Amino Acid Sequence , Search Engine , Genome , Databases, Protein
6.
J Phys Chem B ; 127(9): 1914-1921, 2023 03 09.
Article in English | MEDLINE | ID: mdl-36848294

ABSTRACT

The sequence correlations within a protein multiple sequence alignment are routinely used to predict contacts within its structure, but here we point out that these data can also be used to predict a protein's dynamics directly. Elastic network models of protein dynamics rely directly on the contacts, and the normal modes of motion are obtained from the decomposition of the inverse of the contact map. To make the direct connection between sequence and dynamics, the structure must be coarse-grained to one point per amino acid, as has often been done. Coarse-grained dynamics from elastic network models have been highly successful, particularly in representing the large-scale motions of proteins, which usually relate closely to their functions. The interesting implication is that the structure itself need not be known to obtain a protein's dynamics; the sequence information can be used directly instead.
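The contact-to-dynamics step can be illustrated with the Gaussian network model (GNM), one widely used elastic network variant. Whether the paper uses exactly this model is an assumption here. The contact map becomes a Kirchhoff (graph Laplacian) matrix. Its nonzero eigenvectors are the normal modes, and the diagonal of its pseudo-inverse is proportional to each residue's mean-square fluctuation. The contact map could come from a structure or from contacts predicted from sequence. The toy chain below is hypothetical.

```python
import numpy as np


def gnm_modes(contacts, tol=1e-8):
    """Normal modes of a Gaussian network model built from a contact map.

    contacts: symmetric (N, N) 0/1 array with a zero diagonal.
    Returns (eigenvalues, modes, fluctuations) with the zero mode removed.
    """
    a = np.asarray(contacts, dtype=float)
    # Kirchhoff matrix: contact count on the diagonal, -1 for each contact.
    kirchhoff = np.diag(a.sum(axis=1)) - a
    evals, evecs = np.linalg.eigh(kirchhoff)  # eigenvalues in ascending order
    keep = evals > tol                          # drop the rigid-body zero mode(s)
    evals, evecs = evals[keep], evecs[:, keep]
    # Pseudo-inverse of the Kirchhoff matrix; its diagonal is proportional
    # to each residue's mean-square fluctuation.
    pinv = (evecs / evals) @ evecs.T
    return evals, evecs, np.diag(pinv)


# Toy contact map: a 10-residue chain with contacts i-(i+1) and i-(i+2).
n = 10
contacts = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, min(i + 3, n)):
        contacts[i, j] = contacts[j, i] = 1.0
evals, modes, fluct = gnm_modes(contacts)
print(np.round(fluct, 3))
```

As expected for a chain, the terminal residues, which have the fewest contacts, show the largest fluctuations.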


Subjects
Amino Acids , Proteins , Protein Conformation , Models, Molecular , Proteins/chemistry , Motion
8.
Proteins ; 89(6): 671-682, 2021 06.
Article in English | MEDLINE | ID: mdl-33469973

ABSTRACT

Protein sequence matching presently fails to identify many structures that are highly similar, even when they are known to have the same function. The high packing densities in globular proteins lead to interdependent substitutions, which have not previously been considered for amino acid similarities. At present, sequence matching compares sequences based only on the similarities of single amino acids. This ignores the fact that densely packed proteins have additional conservative substitutions, representing exchanges between two interacting amino acids, such as a small-large pair changing to a large-small pair, even though the individual substitutions are not so conservative. Here we show that including information for such pairs of substitutions yields improved sequence matches, and that these yield significant gains in the agreement between sequence alignments and structure matches of the same protein pair. As a result, sequence segments are matched where structure segments are aligned. There are gains for all 2002 collected cases in which the sequence alignments were not previously congruent with the structure matches. Our results also demonstrate a significant gain in detecting homology for "twilight zone" protein sequences. The derived amino acid substitution metrics have many other potential applications: annotation, protein design, mutagenesis design, and empirical potential derivation.


Subjects
Algorithms , Amino Acid Substitution , Amino Acids/chemistry , Proteins/chemistry , Amino Acid Sequence , Amino Acids/metabolism , Databases, Protein , Datasets as Topic , Humans , Models, Molecular , Protein Engineering/methods , Proteins/metabolism , Sequence Alignment , Sequence Homology, Amino Acid
9.
Ann Appl Stat ; 15(2): 902-924, 2021 Jun.
Article in English | MEDLINE | ID: mdl-35910493

ABSTRACT

Measuring the dependence of k ≥ 3 random variables and drawing inference from such higher-order dependences are scientifically important yet challenging. Motivated here by protein coevolution with multivariate categorical features, we consider an information theoretic measure of higher-order dependence. The proposed collective dependence is a symmetrization of differential interaction information which generalizes the mutual information of a pair of random variables. We show that the collective dependence can be easily estimated and facilitates a test on the dependence of k ≥ 3 random variables. Upon carefully exploring the null space of collective dependence, we devise a Classification-Assisted Large scaLe inference procedure to DEtect significant k-COllective DEpendence among d ≥ k random variables, with the false discovery rate controlled. Finite sample performance of our method is examined via simulations. We apply this method to the multiple protein sequence alignment data to study the residue or position coevolution for two protein families, the elongation factor P family and the zinc knuckle family. We identify novel functional triplets of amino acid residues, whose contributions to the protein function are further investigated. These confirm that the collective dependence does yield additional information important for understanding the protein coevolution compared to the pairwise measures.
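Pairwise measures can miss higher-order dependence. The standard XOR example shows this: every pair of variables is independent, yet any two of them fully determine the third. The Python sketch below computes plug-in co-information, defined as I(X;Y) - I(X;Y|Z) and computed by entropy inclusion-exclusion. This is a classical three-variable quantity in the same family as interaction information. It is not the paper's collective dependence estimator or its symmetrization. Note that some authors define interaction information with the opposite sign.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(x, y):
    """Pairwise mutual information I(X;Y) in bits."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def co_information(x, y, z):
    """Three-way co-information, I(X;Y) - I(X;Y|Z), via inclusion-exclusion."""
    return (entropy(x) + entropy(y) + entropy(z)
            - entropy(list(zip(x, y))) - entropy(list(zip(x, z)))
            - entropy(list(zip(y, z))) + entropy(list(zip(x, y, z))))


# XOR triple: every pair is independent, yet Z is fully determined by (X, Y).
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [a ^ b for a, b in zip(x, y)]
print(mutual_information(x, y), mutual_information(x, z), co_information(x, y, z))
```

All three pairwise mutual informations are zero, while the co-information is -1 bit. A purely pairwise analysis of such alignment columns would therefore report no coupling at all.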

10.
Front Mol Biosci ; 7: 607323, 2020.
Article in English | MEDLINE | ID: mdl-33614705

ABSTRACT

Two new computational approaches are described to aid in the design of new peptide-based drugs: evaluating ensembles of protein structures from their dynamics, and assessing structures with empirical contact potentials. These approaches build on the concept that conformational variability can aid in the binding process and, for disordered proteins, can even facilitate the binding of more diverse ligands. This latter consideration indicates that such a design process should be less restrictive, so that multiple inhibitors might be effective. The example chosen here focuses on proteins/peptides that bind to hemagglutinin (HA) to block the large-scale conformational change required for its activation. Variability in the conformations is considered from sets of experimental structures or, alternatively, from their simple computed dynamics. The large set considered is the collection of peptides/small proteins designed by the David Baker lab to bind to hemagglutinin, and it is assessed with the new empirical contact potentials.

11.
Biophys J ; 112(8): 1561-1570, 2017 Apr 25.
Article in English | MEDLINE | ID: mdl-28445748

ABSTRACT

Protein functional mechanisms usually require conformational changes, and there are often known structures for the different conformational states. However, usually neither the origin of the driving force nor the underlying pathways for these conformational transitions are known. Exothermic chemical reactions may be an important source of the forces that drive conformational changes. Here we investigate this type of force, originating from ATP hydrolysis, in the chaperonin GroEL by applying forces that originate from the chemical reaction. Specifically, we apply directed forces to drive the GroEL conformational changes and find that there is a highly specific direction for the applied forces that drives the closed form to the open form. For this purpose, we use coarse-grained elastic network models. Principal component analysis of 38 GroEL experimental structures yields the most important motions, and these are used in structural interpolation to construct a coarse-grained free energy landscape. In addition, we investigate a more random application of forces with a Monte Carlo method and demonstrate pathways for the closed-open conformational transition in both directions by computing trajectories that are shown on the free energy landscape. The initial root mean square deviation (RMSD) between the open and closed forms of the subunit is 14.7 Å, and the final forms from our simulations reach an average RMSD of 3.6 Å from the target forms, closely matching the level of resolution of the coarse-grained model.
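RMSD between two conformations is conventionally reported after optimal rigid-body superposition. The NumPy sketch below uses the Kabsch algorithm for that superposition. The superposition protocol and coordinate set behind the 14.7 Å and 3.6 Å values are not stated in the abstract. The coordinates below are random stand-ins for coarse-grained (one site per residue) positions.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centering both sets on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch).
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # guard against reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(1)
ca_a = rng.normal(scale=10.0, size=(50, 3))   # stand-in for 50 C-alpha sites
theta = np.deg2rad(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
ca_b = ca_a @ rot_z.T + np.array([5.0, -2.0, 3.0])   # rotated + translated copy
print(kabsch_rmsd(ca_a, ca_b))  # ~0: same shape, different pose
```

Because rigid-body motion is removed before the deviation is measured, the remaining RMSD reflects only genuine conformational difference.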


Subjects
Adenosine Triphosphate/chemistry , Bacterial Proteins/chemistry , Chaperonin 60/chemistry , Adenosine Triphosphate/metabolism , Bacterial Proteins/metabolism , Chaperonin 60/metabolism , Computer Simulation , Escherichia coli , Hydrolysis , Models, Chemical , Models, Molecular , Monte Carlo Method , Paracoccus denitrificans , Principal Component Analysis , Protein Conformation , Thermus thermophilus
12.
Proc Natl Acad Sci U S A ; 114(11): 2928-2933, 2017 03 14.
Article in English | MEDLINE | ID: mdl-28265078

ABSTRACT

Evaluating protein structures requires reliable free energies with good estimates of both potential energies and entropies. Although there are many demonstrated successes from using knowledge-based potential energies, computing entropies of proteins has lagged far behind. Here we take an entirely different approach and evaluate knowledge-based conformational entropies of proteins based on the observed frequencies of contact changes between amino acids in a set of 167 diverse proteins, each of which has two alternative structures. The results show that charged and polar interactions break more often than hydrophobic pairs. This pattern correlates strongly with the average solvent exposure of amino acids in globular proteins, as well as with polarity indices and the sizes of the amino acids. Knowledge-based entropies are derived by using the inverse Boltzmann relationship, in a manner analogous to the way that knowledge-based potentials have been extracted. Including these new knowledge-based entropies almost doubles the performance of knowledge-based potentials in selecting the native protein structures from decoy sets. Beyond the overall energy-entropy compensation, a similar compensation is seen for individual pairs of interacting amino acids. The entropies in this report have immediate applications for 3D structure prediction, protein model assessment, and protein engineering and design.
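The inverse Boltzmann relationship referred to above converts an observed frequency into an energy-like score: -RT ln(f_observed / f_expected). The Python sketch below shows the generic form used to extract knowledge-based pair contact potentials, with a quasi-chemical reference state. The paper instead applies the relationship to frequencies of contact changes between alternative structures to obtain entropies. Its exact reference state and normalization are not reproduced here, and the pair counts below are hypothetical.

```python
import math
from collections import Counter

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def inverse_boltzmann_pair_scores(pair_counts, temperature=300.0):
    """Score unordered residue pairs as -RT ln(f_observed / f_expected).

    pair_counts maps sorted residue-type tuples, e.g. ("A", "L"), to counts.
    Expected frequencies use the quasi-chemical reference state:
    f_a * f_b for a == b and 2 * f_a * f_b for a != b.
    """
    total_pairs = sum(pair_counts.values())
    residue_counts = Counter()
    for (a, b), n in pair_counts.items():
        residue_counts[a] += n
        residue_counts[b] += n
    total_residues = sum(residue_counts.values())
    scores = {}
    for (a, b), n in pair_counts.items():
        f_obs = n / total_pairs
        f_a = residue_counts[a] / total_residues
        f_b = residue_counts[b] / total_residues
        f_exp = f_a * f_b * (1.0 if a == b else 2.0)
        scores[(a, b)] = -R_KCAL * temperature * math.log(f_obs / f_exp)
    return scores


# Hypothetical counts: like-like contacts are enriched, A-B contacts depleted.
scores = inverse_boltzmann_pair_scores({("A", "A"): 40, ("A", "B"): 20, ("B", "B"): 40})
print(scores)
```

Enriched pairs receive negative (favorable) scores and depleted pairs positive ones. Counts that exactly match the reference state score zero.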


Subjects
Entropy , Protein Conformation , Proteins/chemistry , Amino Acids/chemistry , Hydrophobic and Hydrophilic Interactions , Protein Folding , Solvents/chemistry
13.
J Mol Biol ; 428(5 Pt A): 802-810, 2016 Feb 27.
Article in English | MEDLINE | ID: mdl-26687034

ABSTRACT

The essential aspects of the ribosome's mechanism can be extracted from coarse-grained simulations, including the ratchet motion, the movement together of critical bases at the decoding center, and movements of the peptide tunnel lining that assist in the expulsion of the synthesized peptide. Because of the ribosome's large size, coarse graining helps to simplify its mechanism and aids in understanding it. The results presented here use coarse-grained elastic network modeling to extract the dynamics, with both RNAs and proteins coarse grained. We review our previous results, showing the well-known ratchet motions and the motions in the peptide tunnel and in the mRNA tunnel. The motions of the lining of the peptide tunnel appear to assist in the expulsion of the growing peptide chain. Clamps involving three proteins at the ends of the mRNA tunnel ensure that the mRNA is held tightly during decoding and are essential for the helicase activity at the entrance. The entry clamp may also assist in base recognition to ensure proper selection of the incoming tRNA. The overall precision of the ribosome's machine-like motions is remarkable.


Subjects
Models, Molecular , RNA, Messenger/chemistry , Ribosomes/chemistry , Peptides/chemistry , Protein Conformation , RNA, Transfer/chemistry , Ribosomal Proteins/chemistry , Thermus thermophilus/chemistry
14.
Methods Mol Biol ; 1215: 213-36, 2015.
Article in English | MEDLINE | ID: mdl-25330965

ABSTRACT

The number of solved protein structures submitted to the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high; for example, there are over 550 solved structures for HIV-1 protease, a protein essential for the life cycle of the human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of experimentally solved structures for the same protein has information buried within it that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENMs). PCA is a widely used statistical dimensionality-reduction technique for classifying and visualizing high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of both. Moreover, it describes an improved ENM version that uses the variations found within a given set of structures for a protein. As a practical example, we extract the functional dynamics and mechanism of the HIV-1 protease dimeric structure using a set of 329 PDB structures of this protein. We describe, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs.
We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them.
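The PCA step can be sketched in a few lines of NumPy. Flatten each superimposed structure to a 3N-vector, subtract the mean structure, and take the SVD: the right singular vectors are the principal components. One common congruency measure between a PC and an ENM mode is their absolute cosine (overlap). Whether the chapter uses exactly this measure is an assumption. The sketch also assumes the structures are already superimposed, and it uses a synthetic ensemble rather than the 329 HIV-1 protease structures.

```python
import numpy as np


def structure_pca(coords):
    """PCA of an ensemble of superimposed structures.

    coords: (M, N, 3) array, M structures of N atoms, already superimposed.
    Returns (components, variance_fractions); each row of components is a
    3N-dimensional direction of motion, ordered by decreasing variance.
    """
    x = coords.reshape(coords.shape[0], -1)
    x = x - x.mean(axis=0)                     # deviations from the mean structure
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    variance = s ** 2 / (x.shape[0] - 1)
    return vt, variance / variance.sum()


def overlap(u, v):
    """|cos| of the angle between two modes (1 = same direction)."""
    return float(abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
n_atoms, n_structs = 20, 40
mean_structure = rng.normal(scale=5.0, size=n_atoms * 3)
direction = rng.normal(size=n_atoms * 3)
direction /= np.linalg.norm(direction)
# Synthetic ensemble: motion mostly along one direction plus small noise.
amplitudes = np.linspace(-3.0, 3.0, n_structs)
flat = (mean_structure + amplitudes[:, None] * direction
        + rng.normal(scale=0.01, size=(n_structs, n_atoms * 3)))
ensemble = flat.reshape(n_structs, n_atoms, 3)
pcs, fractions = structure_pca(ensemble)
print(fractions[0], overlap(pcs[0], direction))
```

With real data, `direction` would be replaced by an ENM mode computed for the same atoms, such as one from an elastic network model of the mean structure. The overlap then quantifies how well the simple model reproduces the experimentally observed variability.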


Subjects
HIV Protease/chemistry , Models, Molecular , Databases, Protein , Entropy , Humans , Principal Component Analysis